45 research outputs found

    Federating Queries to RDF repositories

    Large amounts of RDF data are currently being published on the Web. These data are commonly accessed through SPARQL endpoints. However, querying a set of SPARQL endpoints requires new mechanisms, because neither the SPARQL protocol nor the language provides any norms or guidelines on how to proceed. In this paper we present an approach for federating queries to a set of SPARQL endpoints, using distributed query processing techniques from relational databases and part of the WS-DAI specification for web-service-based access to relational and XML databases.

    Federated Query Processing for the Semantic Web

    Recent years have witnessed constant growth in the amount of RDF data available on the Web. This growth is largely driven by the increasing rate of data publication on the Web by different actors such as governments, life science researchers or geographical institutes. RDF data are mainly generated by converting existing legacy data resources into RDF (e.g. converting data stored in relational databases into RDF), but also by creating RDF data directly (e.g. from sensors). These RDF data are normally exposed by means of Linked Data-enabled URIs and SPARQL endpoints. Given the sustained growth in the number of SPARQL endpoints available, the need to send federated SPARQL queries across them has also grown. Tools for accessing sets of RDF data repositories are starting to appear. They differ in how they let users access these data: some let users specify directly which RDF data set they want to query, while others make this process transparent to them. To overcome this heterogeneity in federated query processing solutions, the W3C SPARQL working group is defining a federation extension for SPARQL 1.1, which allows graph patterns that can be evaluated at several endpoints to be combined in a single query. In this PhD thesis, we describe the syntax of that SPARQL extension for providing access to distributed RDF data sets and formalise its semantics. We adapt existing techniques for distributed data access in relational databases to deal with SPARQL endpoints, and we have implemented them in our federated query evaluation system (SPARQL-DQP). We describe the static optimisation techniques that we implemented in our system, and we carry out a series of experiments showing that our optimisations significantly speed up the query evaluation process in the presence of large query results and optional operators.

    Semantics and Optimization of the SPARQL 1.1 Federation Extension

    The W3C SPARQL working group is defining the new SPARQL 1.1 query language. The current working draft of SPARQL 1.1 focuses mainly on the description of the language. In this paper, we provide a formalization of the syntax and semantics of the SPARQL 1.1 federation extension, an important fragment of the language that has not yet received much attention. In addition, we propose optimization techniques for this fragment, provide an implementation of the fragment that includes these techniques, and carry out a series of experiments showing that our optimization procedures can significantly speed up the query evaluation process.

    Situated Support for Choice of Representations

    As more and more companies augment their data with semantics, it is imperative that the choice of modelling language is well founded in knowledge about the language and the domain in question. This work demonstrates how the Semiotic Quality Framework can facilitate the choice of the most suitable language for a real-world application. Computational and situated features are introduced as an extension to the framework.

    Federating queries in SPARQL 1.1: syntax, semantics and evaluation

    Given the sustained growth in the number of SPARQL endpoints available, the need to send federated SPARQL queries across them has also grown. To address this use case, the W3C SPARQL working group is defining a federation extension for SPARQL 1.1, which allows graph patterns that can be evaluated over several endpoints to be combined within a single query. In this paper, we describe the syntax of that extension and formalize its semantics. Additionally, we describe how a query evaluation system can be implemented for that federation extension, describing some static optimization techniques and reusing a query engine used for data-intensive science, so as to deal with large amounts of intermediate and final results. Finally, we carry out a series of experiments showing that our optimizations speed up the federated query evaluation process.
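    As a rough illustration of what the federation extension looks like in practice, the sketch below assembles a SPARQL 1.1 query in which each group of triple patterns is wrapped in a SERVICE clause naming the endpoint that should evaluate it. The helper names (`service_block`, `federated_query`) and the example.org endpoint URLs are hypothetical and are not taken from the paper; only the SERVICE keyword comes from SPARQL 1.1.

```python
def service_block(endpoint_url: str, patterns: list[str]) -> str:
    """Wrap triple patterns in a SERVICE clause for one remote endpoint."""
    body = "\n".join(f"    {p} ." for p in patterns)
    return f"  SERVICE <{endpoint_url}> {{\n{body}\n  }}"


def federated_query(variables: list[str], services: dict[str, list[str]]) -> str:
    """Combine one SERVICE block per endpoint into a single SELECT query."""
    projection = " ".join(f"?{v}" for v in variables)
    blocks = "\n".join(service_block(url, pats) for url, pats in services.items())
    return f"SELECT {projection} WHERE {{\n{blocks}\n}}"


# Hypothetical endpoints: one holds the people, the other holds their names.
query = federated_query(
    ["person", "name"],
    {
        "http://example.org/sparql-a": ["?person a <http://xmlns.com/foaf/0.1/Person>"],
        "http://example.org/sparql-b": ["?person <http://xmlns.com/foaf/0.1/name> ?name"],
    },
)
print(query)
```

    The two SERVICE blocks share the variable ?person, so the engine that receives the query has to join the partial results returned by the two endpoints.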

    The Odyssey Approach for Optimizing Federated SPARQL Queries

    Answering queries over a federation of SPARQL endpoints requires combining data from more than one data source. Optimizing queries in such scenarios is particularly challenging, not only because of (i) the large variety of possible query execution plans that correctly answer the query, but also because (ii) there is only limited access to statistics about the schema and instance data of remote sources. To overcome these challenges, most federated query engines rely on heuristics to reduce the space of possible query execution plans or on dynamic programming strategies to produce optimal plans. Nevertheless, these plans may still exhibit a high number of intermediate results or high execution times because of heuristics and inaccurate cost estimations. In this paper, we present Odyssey, an approach that uses statistics that allow for a more accurate cost estimation for federated queries and therefore enables Odyssey to produce better query execution plans. Our experimental results show that Odyssey produces query execution plans that are better in terms of data transfer and execution time than state-of-the-art optimizers. Our experiments using the FedBench benchmark show execution time gains of at least 25 times on average. Comment: 16 pages, 10 figures

    WEASEL: Vodafone Corporate Semantic Web

    The 2006 Gartner emerging technology curve highlights the relevance of the Corporate Semantic Web as one of the most promising IT areas in the next five years. The work presented herein describes WEASEL, an initiative funded by Vodafone to apply and evaluate such technology in the context of a large multinational company. This scenario comprises a number of heterogeneous web sites containing unstructured and related, but physically decoupled, information, which needs common models that provide unified ways of representing information across the different sources, i.e. ontologies. Three main milestones were defined for WEASEL: the creation of a domain ontology; the extraction of information from the different sources and its semantic annotation and aggregation; and the creation of a new web site containing a semantic search engine that provides natural interfaces for retrieving the aggregated information. WEASEL concluded with its evaluation by Vodafone.

    Strategies for executing federated queries in SPARQL1.1

    A common way of exposing RDF data on the Web is by means of SPARQL endpoints, which allow end users and applications to query just the RDF data they want. However, servers hosting SPARQL endpoints often restrict access to the data by limiting the number of results returned per query or the number of queries that a client may issue per unit of time. As this may affect query completeness when using SPARQL1.1's federated query extension, we analysed different strategies for implementing federated queries with the goal of circumventing endpoint limits. We show that some seemingly intuitive methods for decomposing federated queries produce unsound results in the general case, and we provide fixes or discuss under which restrictions these recipes are still applicable. Finally, we evaluate the proposed strategies to check their feasibility in practice.
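    To make the result-limit problem concrete, the sketch below pages through a capped endpoint with LIMIT and OFFSET until a short page signals the end. It is an illustration only and is not claimed to be one of the strategies analysed in the paper. The endpoint is simulated in memory, the names `make_fake_endpoint` and `fetch_all` are hypothetical, and the sketch assumes that the base query carries an ORDER BY so that pages are stable and that the client's page size does not exceed the server cap.

```python
from typing import Callable

Row = dict[str, str]


def make_fake_endpoint(data: list[Row], server_cap: int) -> Callable[[str], list[Row]]:
    """Simulate an endpoint that never returns more than server_cap rows per query."""
    def run_query(query: str) -> list[Row]:
        tokens = query.split()
        limit = int(tokens[tokens.index("LIMIT") + 1])
        offset = int(tokens[tokens.index("OFFSET") + 1])
        return data[offset:offset + min(limit, server_cap)]
    return run_query


def fetch_all(run_query: Callable[[str], list[Row]], base_query: str, page_size: int) -> list[Row]:
    """Collect all rows by requesting successive pages until a short page appears.

    Assumes page_size <= the server cap; otherwise the first capped page
    would be mistaken for the last one and results would be lost.
    """
    rows: list[Row] = []
    offset = 0
    while True:
        page = run_query(f"{base_query} LIMIT {page_size} OFFSET {offset}")
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size


data = [{"s": f"http://example.org/item/{i}"} for i in range(25)]
endpoint = make_fake_endpoint(data, server_cap=10)
results = fetch_all(endpoint, "SELECT ?s WHERE { ?s ?p ?o } ORDER BY ?s", page_size=10)
print(len(results))
```

    Each page costs one request, so this kind of paging trades the result cap against any limit on the number of queries per unit of time.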

    Moving real-time linked data query evaluation to the client

    Traditional RDF stream processing engines work completely server-side, which contributes to a high server cost. To allow a large number of concurrent clients to do continuous querying, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this poster, we give an overview of a client-side RDF stream processing engine on top of TPF. Our experiments show that our solution significantly lowers the server load while increasing the load on the clients. Preliminary results indicate that our solution moves the complexity of continuously evaluating real-time queries from the server to the client, which makes real-time querying much more scalable for a large number of concurrent clients when compared to the alternatives.

    Co-evolution of RDF Datasets

    Linking Data initiatives have fostered the publication of a large number of RDF datasets in the Linked Open Data (LOD) cloud, as well as the development of query processing infrastructures to access these data in a federated fashion. However, different experimental studies have shown that the availability of LOD datasets cannot always be ensured, so RDF data replication is required to build reliable federated query frameworks. While it enhances data availability, RDF data replication requires synchronization and conflict resolution when replicas and source datasets are allowed to change data over time, i.e., co-evolution management needs to be provided to ensure consistency. In this paper, we tackle the problem of RDF data co-evolution and devise an approach for conflict resolution during co-evolution of RDF datasets. Our proposed approach is property-oriented and allows for exploiting semantics about RDF properties during co-evolution management. The quality of our approach is empirically evaluated in different scenarios on the DBpedia-live dataset. Experimental results suggest that the proposed techniques have a positive impact on the quality of data in source datasets and replicas. Comment: 18 pages, 4 figures, Accepted in ICWE, 201